An Adversarial Analysis of the Reidentifiability of the Heritage Health Prize Dataset

Author

  • Arvind Narayanan
Abstract

I analyze the reidentifiability of the Heritage Health Prize dataset, taking into account the auxiliary information available online and offline to a present-day adversary. A key technique is identifying providers, which is useful both as an end in itself and as a stepping stone towards identifying members. My primary findings are: (1) grouping providers based on shared members produces clusters that likely correspond to hospitals; (2) there is enough auxiliary information to identify most of these hospitals, and possibly also individual providers; (3) an adversary who has detailed information about a member's health conditions will be able to uniquely identify him or her; (4) while there are numerous websites where users can share reviews, health conditions, etc., their adoption is not currently high enough to serve as a source of auxiliary information for a large-scale member-reidentification attack. I provide bounds on the efficacy of the methods I describe, but time constraints prevented me from attempting a more complete attack. In my judgment, reidentification is within the realm of possibility; however, it is far from straightforward and will require algorithmic sophistication as well as sleuthing for auxiliary data. While identification of providers might be useful to contestants for improving predictive performance, large-scale reidentification of members, which could threaten both privacy and the fidelity of the contest, appears unlikely to be feasible due to the paucity of auxiliary information.

∗ e-mail: [email protected]; web: http://randomwalker.info/
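As an illustration of the first finding, grouping providers by shared members can be sketched as a graph problem: link two providers when enough members have seen both, then take connected components as candidate clusters. The sketch below is only an assumption about how such a grouping might be done; the abstract does not specify the method, and the function name `cluster_providers`, the `(member_id, provider_id)` input format, the `min_shared` threshold, and the toy data are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cluster_providers(claims, min_shared=2):
    """Group providers into clusters linked by shared members.

    claims: iterable of (member_id, provider_id) pairs (hypothetical format).
    min_shared: hypothetical threshold; two providers are linked only if
        at least this many distinct members appear with both of them.
    Returns a list of sorted provider lists, largest cluster first.
    """
    # Collect the set of providers each member has visited.
    providers_by_member = defaultdict(set)
    for member, provider in claims:
        providers_by_member[member].add(provider)

    # Count, for each provider pair, how many members they share.
    shared = defaultdict(int)
    for providers in providers_by_member.values():
        for a, b in combinations(sorted(providers), 2):
            shared[(a, b)] += 1

    # Union-find over providers; every provider starts as its own cluster.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for providers in providers_by_member.values():
        for p in providers:
            find(p)

    # Merge the clusters of any pair that meets the threshold.
    for (a, b), count in shared.items():
        if count >= min_shared:
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for p in list(parent):
        clusters[find(p)].add(p)
    return sorted((sorted(c) for c in clusters.values()),
                  key=lambda c: (-len(c), c))


# Toy data, invented purely for illustration.
toy_claims = [
    ("m1", "P1"), ("m1", "P2"),
    ("m2", "P1"), ("m2", "P2"),
    ("m3", "P2"), ("m3", "P3"),
    ("m4", "P4"), ("m4", "P5"),
    ("m5", "P4"), ("m5", "P5"),
]
print(cluster_providers(toy_claims))
```

Connected components are the simplest possible choice here; a real analysis would likely need a weighted or density-based clustering method, since a single well-travelled member can otherwise merge unrelated groups when the threshold is low.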

Similar resources

De-identification Methods for Open Health Data: The Case of the Heritage Health Prize Claims Dataset

BACKGROUND There are many benefits to open datasets. However, privacy concerns have hampered the widespread creation of open health data. There is a dearth of documented methods and case studies for the creation of public-use health data. We describe a new methodology for creating a longitudinal public health dataset in the context of the Heritage Health Prize (HHP). The HHP is a global data mi...

Turn-Taking, Preference, and Face in Criticism Responses

Vivas have multiple functions in academia, but their main goal is completing thesis evaluation. At the heart of this evaluation is a series of criticisms and their responsive turns by which participants talk vivas as institution into being (Heritage, 1997). Turn-taking is one of the many ways vivas are talked into being. This study drew upon conversation analysis to look into the turn allocatio...

Robust Opponent Modeling in Real-Time Strategy Games using Bayesian Networks

Opponent modeling is a key challenge in Real-Time Strategy (RTS) games as the environment is adversarial in these games, and the player cannot predict the future actions of her opponent. Additionally, the environment is partially observable due to the fog of war. In this paper, we propose an opponent model which is robust to the observation noise existing due to the fog of war. In order to cope...

Exploration and analyzing the value of Industrial architectural heritage conservation, Case study: Cement Factory of Shahr-e-Rey

Abstract: As cultural heritage, industrial buildings are prone to various changes and transformations, to the extent that they are practically susceptible to complete destruction. Accordingly, such spaces, which should be considered lively places, have unfortunately turned into symbols of urban disorder. Also, industrial heritage covers social, economic, and cultural values. Therefore, de...

Automatic Colorization of Grayscale Images Using Generative Adversarial Networks

Automatic colorization of gray scale images poses a unique challenge in Information Retrieval. The goal of this field is to colorize images which have lost some color channels (such as the RGB channels or the AB channels in the LAB color space) while only having the brightness channel available, which is usually the case in a vast array of old photos and portraits. Having the ability to coloriz...

Publication date: 2014